Joint recognition of text and layout in historical Russian documents

Journal: Scientific and Technical Journal of Information Technologies, Mechanics and Optics

Mohammed Samah, Teslya Nikolay

UDC: 004.932.75

Issue: 3 (149)

Abstract

In this paper, we evaluate the Document Attention Network (DAN), the first end-to-end segmentation-free architecture, on historical Russian documents. The DAN model jointly recognizes text and layout: it takes whole documents of any size as input and outputs the text together with logical layout tokens. For comparison purposes, we conduct our experiments on the Digital Peter dataset, which has previously been recognized at line level. The dataset consists of manuscripts of Peter the Great; ground truths are represented according to a sophisticated XML schema that enables an accurate, detailed definition of layout and text regions. At page level, we achieved a Character Error Rate (CER) of 18.71 %, a Word Error Rate (WER) of 39.7 %, a Layout Ordering Error Rate (LOER) of 14.11 %, and a mean Average Precision (mAP) of 66.67 %.
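For readers unfamiliar with the two text metrics, the following is a minimal sketch of how CER and WER are commonly computed: the Levenshtein edit distance between reference and prediction, divided by the reference length in characters or words. The function names `edit_distance`, `cer`, and `wer` are illustrative, and the per-sample normalization shown here is an assumption; the abstract does not state how the authors tokenize or aggregate errors over the dataset.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character Error Rate; assumes a non-empty reference string."""
    return edit_distance(ref, hyp) / len(ref)


def wer(ref, hyp):
    """Word Error Rate on whitespace-separated words (an assumed tokenization)."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


if __name__ == "__main__":
    reference = "peter the great"
    prediction = "peter the grate"
    print(f"CER = {cer(reference, prediction):.4f}")
    print(f"WER = {wer(reference, prediction):.4f}")
```

Because a single wrong character makes the whole word count as an error, WER is typically well above CER on the same output, which is consistent with the gap between the two figures reported above.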
